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Abstract —Blind Source Separation is a widely used technique 
to analyze multichannel data. In many real-world applications, its 
results can be significantly hampered by the presence of unknown 
outliers. In this paper, a novel algorithm coined rGMCA (robust 
Generalized Morphological Component Analysis) is introduced 
to retrieve sparse sources in the presence of outliers. It explicitly 
estimates the sources, the mixing matrix, and the outliers. It 
also takes advantage of the estimation of the outliers to further 
implement a weighting scheme, which provides a highly robust 
separation procedure. Numerical experiments demonstrate the 
efficiency of rGMCA to estimate the mixing matrix in comparison 
with standard BSS techniques. 

Index Terms —Blind source separation, sparse representations, 
sparsity, robust recovery, outliers. 

I. Introduction 

HE advances in multichannel technologies are exploited 
in various fields such as astrophysics (l| or hyperspectral 
remote-sensing (2). This has generated a considerable interest 
for methods able to extract the relevant information from these 
specific data. Blind Source Separation (BSS) is one of them. 
In this context, the m noisy observations {X^} i=1 m are 
assumed to be the linear mixture of n < m sources {S^ } J=1 n 
with t > m samples. The presence of instrumental noise and 
model imperfections is usually admitted and represented by 
a Gaussian noise N. The BSS techniques aim to estimate A 
and S from X. This problem is well-known to be ill-posed 
as the number of solutions is infinite. This requires imposing 
further prior information to recover the original sources such 
as the statistical independence of the sources (ICA methods 
0 and references therein) or the sparsity of s g-@. 

Most of the techniques in BSS are highly sensitive to the 
presence of spurious outliers while these ones are frequent in 
real-world data |7j. Indeed in many applications, the data are 
additionally corrupted by a few, and therefore sparse, large 
errors that cannot be modeled by standard additive Gaussian 
noise: this includes stripping noise |[8), impulsive noise 0. 
glitches 110] to name only a few. In the next, we will consider 
that the observations can be expressed as: 

X = AS + O + N, 

where X G R mxt stands for the observations, A G R mxn 
the mixing matrix, S G R nxt the sources, O G R mxt the 
outliers, and N G R mxn the Gaussian noise. 

To be best of our knowledge, only few blind source 
separation methods have been proposed for coping with 
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outliers. So far, the state-of-the-art techniques can be divided 
into three groups. The first approach consists in replacing the 
data fidelity term in the standard methods by some divergence 
that provides more robustness to the presence of outliers, 
without estimating them explicitly jTTJ, fl2) . In particular, 
the use of the /^-divergence, which is a generalization of 
the Kullback-Leibler divergence, has received a growing 
attention. In G3 a general BSS method using statistical 
independence and minimizing the /3-divergence has been 
proposed. The numerical experiments in ED show that 
by finding an appropriate value of /3, the estimation of A 
becomes quite robust to the outliers. However, the selection 
of /3 is a challenging task in practice and this method only 
estimates the mixing matrix: the sources and the outliers 
remain unmixed. 

The second scheme proceeds in a two-steps strategy that 
consists in: i) pre-processing the data to discard the outliers, 
and ii) performing source separation on the pre-processed 
estimate of the data AS. Several outlier removal methods 
that exploit the low-rank property (m ^ n) of AS have 
been recently proposed, especially in hyperspectral imaging 
GT G3- Indeed, it has been shown that it is possible to 
separate AS and O with a high accuracy if AS has low-rank 
and if the outliers are sparsely distributed and additionally 
assumed to be in general position (they do not cluster in 
specific directions) (16} PCP]. However, these assumptions are 
generally not met in practice, in particular when the number 
of observations is close to the number of sources. The major 
drawback of this approach is that an inaccurate estimation 
of the mixture AS will very likely hamper the separation 
process. Moreover, the parameters of these methods need to 
be tuned to return a sufficiently accurate estimation of AS 
prior to performing the separation. 

Last, the third strategy consists in estimating jointly the 
sources, the mixing matrix and the outliers G3, GD- 
This leads to a flexible framework in which the priors on 
the components can be individually taken into account. 
In |T7] for example, the authors use the /3-divergence for 
the data fidelity term X — AS — O, the £ 2,1 norm for O, 
which corrupts entirely some columns of X, to enforce 
its sparsity and the positivity prior for A and S. This 
third category has only been developed in the scope of 
NMF, where further enforcing the positivity of A and S 
is known to greatly enhance the separation process. Since 
non-negativity is not always a valid assumption in physical 
applications (e.g. astrophysics [ 1 ]), neither the mixing matrix 
nor the sources will be assumed to be non-negative in the next. 

Contribution: a novel robust algorithm for sparse BSS is 
proposed. To the best of our knowledge, no method using 
sparsity for tackling BSS problems in the presence of outliers 
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has been studied so far. Building upon the sparse BSS algo¬ 
rithm GMCA, our algorithm coined rGMCA (robust GMCA) 
estimates jointly A, S and O based on the sparsity of the out¬ 
liers and the sources. Besides, and apart from presuming that 
the outliers are sparse and in general position, the algorithm 
makes no assumption about the sparsity pattern of the outliers. 
This makes the proposed algorithm very general and suitable 
for various scenarios. 

The paper is organized as follows: Section II presents the 
basics of GMCA, Section III introduces rGMCA, and last, 
numerical experiments are performed to compare rGMCA with 
standard BSS methods in Section IV. 


II. Sparse BSS 


The sparsity prior has been shown to be an effective 
separation criterion for BSS 0, 0. Essentially, it assumes 
that the sources are sparse, which means that they can be 
described by few coefficients in a given dictionary <1>. In the 
present paper, we will assume that the sources are sparse in the 
sample domain: = I. The morphological diversity property 
(MDP - see 0) further assumes that the sparse sources do 
not share their largest entries. This allows to differentiate the 
sources from their most significant entries in <I>. The algorithm 
GMCA 15] takes advantage of the MDP to seek the sources S 
and the mixing matrix A from the observations X = AS + N 
by solving: 


. 1 

argrmn - 

A,S 2 


X-AS 




where the quadratic term is the data fidelity term, which is 
well suited to deal with Gaussian noise, and the p-norm with 
p < 1 enforces the sparsity of the recovered S. In practice, we 
customarily choose the convex norm. Whereas the above 
problem is not convex, it can be tackled efficiently with Block 
Coordinate Relaxation (BCR (T9)) and alternative projected 
least squares as follows: 

• Update of A for fixed S: 


A = argmin - 

A 2 


X-AS 


obtained with A = XSF, where the symbol denotes 
the Moore-Penrose pseudo inverse. 

• Update of S for fixed A: 


argrmn ; 


X-AS 


■ FA 

3 = 1 


approximated with the soft-thresholding S = iSa^A^X^, 


where 


S A (.Atx]=sign(AtX) iJ ©max(0,|(AtX) i) ,AA), 
where the operator © stands for the entrywise Hadamard 
product. 


In practice, the estimation of the mixing matrix is enhanced 
by using a decreasing threshold strategy. By starting with a 
large value of A, we only select the largest entries of the 
sources. These large coefficients are weakly influenced by 
the Gaussian noise and above all, are very likely to belong 


to only one source (MDP): they are the most discriminant 
samples for the source separation. The sources are then refined 
by decreasing the value of A towards 3cr, where a denotes 
the standard deviation of N. This final threshold ensures 
with a high probability that no Gaussian noise contaminates 
the sources. The noise standard deviation can be estimated 
using robust empirical estimators such as the median absolute 
deviation (MAD). Besides, it has been emphasized that the 
decreasing threshold strategy can prevent GMCA to be trapped 
into local minima 0. 


III. Robust GMCA 

The rationale of the proposed separation procedure relies on 
the difference of structure between the outliers and the term 
AS. Indeed, the outliers are assumed to be in general position 
while the source contribution AS tends to naturally cluster 
along the directions described by the columns of the mixing 
matrix A. Following (20) , minimizing the i\ norm of the 
sources tends to favor solutions A so that the corresponding 
S are clustered along such axes. This motivates the use of 
sparsity to provide an efficient separation scheme. In the 
following, we therefore introduce a new sparsity-enforcing 
BSS method based on the GMCA framework. 


A. A naive extension 


A straightforward strategy to account for the presence of 
outliers in the framework of GMCA is done by including an 
extra sparse term O enforcing the sparsity of the outliers. This 
approach, which we coin Naive robust GMCA (NrGMCA), 
can be formulated as: 


argmm ■ 
0,A,S 


X - AS - O 


FA 

3 = 1 


a 


O 


where the first term is the data-fitting term, well suited to 
deal with the Gaussian noise N, and the two others terms 
enforce the sparsity of the sources and the outliers. 

However, this first naive approach cannot handle large outliers, 
especially if O have several active entries per row or column. 
Indeed, in cases where the outliers are the dominant contribu¬ 
tion to the data, the MDP does not obviously hold since the 
largest entries of the data are related to the outliers and not to 
individual sources. The outliers are then likely to be estimated 
as sources, misleading the estimation of A. 


B. The rGMCA algorithm 

In the presence of large outliers, as the MDP does not 
hold, discriminating between the O and AS becomes more 
challenging and requires at least improving the robustness of 
the estimation of the mixing matrix against the influence of 
the outliers. For this purpose, we propose to extend NrGMCA 
building upon the AMCA algorithm (ZD 

In a different context, the AMCA algorithm extends the 
GMCA in the special case of sparse and partially correlated 
sources, where the MDP does not hold either (ZD- In brief, 
this method relies on an iterative weighting scheme that 
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penalizes non-discriminant entries of the sources. Inspired by 
this approach, we propose to implement a similar weighting 
scheme to penalize samples that are likely to be contaminated 
with large outliers. In the spirit of AMCA, the influence of 
the corrupted samples are weakened by using the following 
weighting scheme in the mixing matrix update stage: 


argmm - 
6 A,S : 


(X - AS - 6)W 


3=1 


+ a 


O 


where W is the penalizing diagonal matrix of size t x t. The 
role of the weighting procedure is to penalize the samples 
of the sources that are likely to be corrupted with outliers. 
It is therefore natural to define the weights W based on 
the current estimate of the outlier matrix O. In the spirit of 
ED an efficient weighting procedure consists in defining the 
weights based on the sparsity level of the columns of the 


outlier matrix as follows: W a = 


e-H O* 


where 6 stands 


for 


median |S( M) l>ol' S '(M)l 
10 


Subsequently, the penalization of a 
given data sample will increase with the amplitude of the 
outliers as well as the number of outliers per data sample. 
Following the structure of the GMCA algorithm, this problem 
is solved by using BCR and alternative projected least squares. 
The structure of the algorithm is presented in Algorithm 1. 
Instead of estimating the variables one by one, we found 
that applying GMCA to the current estimate of X — O and 
then to estimate the outliers from X — AS provides the most 
effective estimation procedure. Indeed, jointly re-estimating 
A and S from a new estimate of the outlier-cleaned data is 
more likely to limit the impact of remaining outliers on the 
source estimation. 


C. Choice of parameters and initialization 

The large outliers are the most damaging as they can 
severely mislead the estimation of A if they are not esti¬ 
mated as outliers from the start. That is why, the algorithms 
NrGMCA and rGMCA start by estimating the largest values 
of X as outliers. On the other hand, the orientation of the 
mixing matrix A, which is initialized as a random matrix 
whose columns are normalized and entries are Gaussian, is 
deduced from the remaining large outliers cleaned data X —O: 
our first estimation of O should not be too conservative 
to keep the clustering aspect of X — O. For this purpose, 
we propose to estimate O with a soft-thresholding at the 
value a 0 = median| X |> 3 cr|X|. Then, similarly to GMCA, 
the sources are estimated as being the largest components in 
A^(X — O), determining the corresponding A 0 . Furthermore, 
this initial value A 0 is fixed at the beginning of the algorithm. 
By keeping a large value of A 0 , we minimize the risk to 
propagate the errors since only few coefficients from the last 
estimate of S are kept. 

Then the decreasing threshold strategy used for A in GMCA 
is similarly kept. Likewise, the parameter a is also decreasing 
towards 3cr to refine O without incorporating too many terms 
of AS or Gaussian noise. 


Algorithm 1: rGMCA 

Initialize 6°, S°, A 0 , W°, a °, and A 0 . 

while k < K do 
while j < J do 

S fc ’i = S^A^” 1 )* (X - O fc_1 ) 

A k ’i = (X - 6 fe - 1 )W fe (S fe ’JW fe ) t/ 
Decrease A J 

end while 

O fc = <S a *(x - A fe > J - 1 S fe ’ J - 1 ) 

Update 
Decrease a k 

end while 

return 6 K \ S*" 1 ” 7 " 1 , 


IV. Numerical experiments 

In this section, we compare the performances of rGMCA 
with standard BSS methods: GMCA (5), PCP+GMCA (the 
outliers are first estimated with PCP and then discarded from 
X |[16)) , the minimization of the /3-divergence with statistical 
independence prior (implementation from (22)) and NrGMCA. 
The values of the free parameters for PCP (A in (Tbj ) and the 
/3-divergence minimization algorithm (/3) are tuned to return 
the best results based on several trials. 

The minimization of the /3-divergence only returns an estimate 
for A, and thus we propose to compare the methods by using 

a criterion depending only on A 151: . The 

median of obtained from 80 Monte-Carlo simulations and 
the percentage of these 80 simulations with an error smaller 
than 5 x 10 -3 (normalized to represent a probability of success) 
are represented in the following figures. The probability of 
success is an interesting indicator of the robustness of the 
algorithms, showing whether they reliably perform well. 

Last, the outliers are composed of both broadly distributed 
errors and entirely corrupted columns (anomalies appearing at 
a same position in each observation) with random amplitudes, 
in order to cover a large variety of noise patterns. 


Influence of the amplitude of the outliers 

We create m = 16 measures of n = 8 sources with 
t = 1024 samples to which a Gaussian noise is added. 
These sources are drawn from a Bernoulli-Gaussian law with 
parameter of activation 0.05 and are scaled so that the largest 
entry is equal to 100. The entries of A are drawn from a 
centered normal law, and each column of A is then normal¬ 
ized. The positions of the outliers are such that: 160 are drawn 
uniformly at random and 10 columns are entirely corrupted. 
Their amplitudes are Gaussian with standard deviation chosen 
according to the axis of fig[l] 

One can notice than the weighting procedure clearly en¬ 
hances the estimation of A: rGMCA outperforms NrGMCA, 
especially when the outliers are large fi g[l] Contrary to the 
others methods, the combination PCP+GMCA and rGMCA 
are almost not influenced by the amplitudes of the outliers 
and are very likely to return good estimates of AfigEl 
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Figure 1. A a versus the amplitudes of the outliers. 



Figure 2. Probability of success versus the amplitude of the outliers. 

Influence of the number of outliers 

The components A, S and N are similar to the ones 
presented in the previous subsection. The percentage of grossly 
corrupted entries is fixed according to the axis of fig(3] Half 
of the outliers are drawn uniformly at random and the others 
correspond to the entirely corrupted columns. Their amplitudes 
are Gaussian with a standard deviation of 100. 



Percentage of corrupted entries 


Figure 3. A a versus the percentage of data corrupted by outliers. 



Figure 4. Probability of success versus the percentage of data corrupted by 
outliers. 

All the methods are equivalent for without outliers. 
The estimations of A obtained with PCP+GMCA, GMCA, 
NrGMCA and the ^-divergence are hampered quite quickly 
by the percentage of corruption of the data fig (3] On the other 
side, rGMCA is robust while the percentage of corruption is 
moderated fig(4] . 

NMR spectrum identification and influence of the number of 
observations 

In this subsection, we evaluate the algorithms in the field of 
the biomedical engineering with simulated data. We propose 
to separate the different Nuclear Magnetic Resonance spectra 
of a simulated mixture which can represent the data provided 
in NMR spectroscopy. By performing BSS on the mixture of 


spectra, we should be able to identify the different molecules 
of the mixture (23J. 

The estimated NMR spectrum of the menthone, the folic 
acid, the ascorbic acid and the myo-inositol from SDBS^are 
convolved with a Laplacian kernel of 2-samples width at half 
maximum (implementation from pyGMCAQj). The number of 
observations is set according to fig(5] . The Gaussian noise 
N is drawn from a Gaussian law with a standard deviation 
of 0.1. The outliers are drawn from a Gaussian law with 
standard deviation 10 3 , so that 20 columns and 1% of the 
entries, broadly distributed, are corrupted. 



Figure 5. A a versus the number of observations. 

Despite the effectiveness of rGMCA and PCP+GMCA if 
m^> n fig(5j none of these algorithms are able to handle the 
outliers if m = n. The algorithm rGMCA is the only one that 
provides a correct estimate of A for m > 5 by means of the 
weighting scheme. However, rGMCA cannot clearly separate 
the original outliers from the estimated sources, fig® 



ppm ppm 


Figure 6. Estimates of the menthone’s NMR spectrum with rGMCA (top of 
the images) and PCP+GMCA (bottom of the images), with ra = 6. On the 
left: estimated spectrum in red and reference in blue-dashed lines - zoomed 
in on the support of the reference. On the right: magnitude of the difference 
between the estimate and the reference. 

V. Conclusion 

We introduce a novel method to separate sparse sources in 
the presence of outliers. The proposed method relies on the 
joint sparsity-based separation of the outliers and the sources. 
This strategy allows us to implement a weighting scheme that 
penalizes corrupted data samples, which is shown to highly 
limit the impact of the outliers on the estimated sources. 
Numerical experiments demonstrate the good and consistent 
performances of our algorithm to robustly estimate the mixing 
matrix. Future work will focus on generalizing the proposed 
approach to enforce sparsity in a transformed domain. 

1 http://sdbs.db.aist.go.jp 

2 http://www.cosmostat.org/software/gmcalab/ 
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